dynamic: Refactor logic to perform multiple updates #1702

Open: wants to merge 9 commits into master

Contributor

@clementguidi clementguidi commented May 11, 2023

@namhyung
Owner

Please rebase this series. I think it should detect non-standard prototypes like int func().

@azharivs

azharivs commented Aug 1, 2023

The previous three comments are now addressed.

else if (mdinfo && !mdinfo->next) /* main binary loaded, no module loaded */
    fmd.main_loaded = true;
else
    return;
Owner

This condition check is confusing as it sometimes checks code_hmap and sometimes mdinfo. I think you can just call dl_iterate_phdr() always and check the mdinfo list inside to prevent duplication.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are performance implications if we traverse the mdinfo linked list every time we want to check for a shared library. Therefore, I decided to add a new static variable, 'modules_loaded', which takes this into account and skips the callback if modules are already loaded.

@azharivs azharivs force-pushed the dynamic-runtime-init branch 2 times, most recently from 1752826 to 3b6e832 Compare August 10, 2023 22:02
Owner

@namhyung namhyung left a comment

Sorry for the delay.

    if (needs_modules)
        hash_size *= 2;
    code_hmap = hashmap_create(hash_size, hashmap_ptr_hash, hashmap_ptr_equals);
}
Owner

What happens if the first update is for main only and the next one needs modules? I think the hash size would stay at the main-only size, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I spotted that and it looked strange to me as well. I'll fix it.
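The sizing concern raised here can be reproduced in a few lines. This is only a sketch: `BASE_HASH_SIZE`, `code_hmap_size` and `prepare_hmap()` are hypothetical stand-ins, and the real code calls `hashmap_create()` rather than storing a size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value for illustration, not taken from uftrace. */
#define BASE_HASH_SIZE 1024

/* 0 means "map not created yet"; stands in for code_hmap == NULL. */
static size_t code_hmap_size;

/* Returns the map size in effect after the call. */
size_t prepare_hmap(bool needs_modules)
{
	if (code_hmap_size == 0) {
		size_t hash_size = BASE_HASH_SIZE;

		if (needs_modules)
			hash_size *= 2;
		/* The size is decided once, on the first call only. */
		code_hmap_size = hash_size;
	}

	assert(code_hmap_size != 0);
	return code_hmap_size;
}
```

Calling `prepare_hmap(false)` and then `prepare_hmap(true)` leaves the size at `BASE_HASH_SIZE` both times, which is the behavior the review comment questions.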


-    dl_iterate_phdr(find_dynamic_module, &fmd);
+    modules_loaded = !dl_iterate_phdr(find_dynamic_module, &fmd);
Owner

I'm not sure if this is correct. IIRC the return value is from the (last) callback. Maybe you can set it only if it needs modules. How about this?

if (code_hmap) {
    /* main executable should be loaded already */
    if (!needs_modules || modules_loaded)
        return;

    /* maybe increase the hash map size */
}
else {
    if (needs_modules)
        hash_size *= 2;
    code_hmap = hashmap_create(...);
}

dl_iterate_phdr(...);

main_loaded = true;
if (needs_modules)
    modules_loaded = true;
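
The control flow suggested above can be exercised in isolation. In this sketch the flags follow the names used in the review, while `map_created`, `scan_loaded_objects()`, `scans_so_far()` and `prepare_dynamic_update_sketch()` are hypothetical helpers; the stub only counts how often the `dl_iterate_phdr()` walk would run.

```c
#include <assert.h>
#include <stdbool.h>

static bool map_created;     /* stands in for code_hmap != NULL */
static bool main_loaded;
static bool modules_loaded;
static int scan_count;       /* how many times the stubbed walk ran */

/* Stub for dl_iterate_phdr(find_dynamic_module, &fmd). */
static void scan_loaded_objects(void)
{
	scan_count++;
}

void prepare_dynamic_update_sketch(bool needs_modules)
{
	if (map_created) {
		/* main executable should be loaded already */
		if (!needs_modules || modules_loaded)
			return;
	} else {
		map_created = true;
	}

	scan_loaded_objects();

	main_loaded = true;
	if (needs_modules)
		modules_loaded = true;

	assert(main_loaded);
}

int scans_so_far(void)
{
	return scan_count;
}
```

With this shape, a main-only update followed by an update that needs modules triggers exactly two walks, and any further call returns early without touching the loader.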

@azharivs azharivs force-pushed the dynamic-runtime-init branch 3 times, most recently from 7830bfa to 68993aa Compare August 29, 2023 19:57
else if (!fmd->needs_modules)
    return main_loaded;
else if (modules_loaded)
    return 0;
Owner

Shouldn't it return 1? Anyway, I think we don't call this function if modules_loaded set already.

@azharivs azharivs Sep 1, 2023

This statement will be executed if
the current shared object is NOT the executable AND we need modules AND modules are loaded.
I had this because of the way the logic originally worked; at this point it is irrelevant, especially if we don't check the return status of dl_iterate_phdr. However, I would rather not speculate and prefer to play it safe, so I decided not to return an error code here.

Owner

prepare_dynamic_update() now checks modules_loaded and doesn't call dl_iterate_phdr() if it's set. So I guess the return value doesn't matter.

code_hmap = hashmap_create(hash_size, hashmap_ptr_hash, hashmap_ptr_equals);
int rc = dl_iterate_phdr(find_dynamic_module, &fmd);
Owner

I think it always returns 0 when needs_modules is true. In error cases it would exit, so it doesn't need to check the return value, right?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but I'm checking the return code to be on the safe side. I am not sure how dl_iterate_phdr is implemented. This way we can catch a case where dl_iterate_phdr returns non-zero for some reason not covered by the callback. Maybe that never happens, and maybe it will?!

Owner

The man page says it'd return the value from the last callback.
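
The documented behavior is easy to check on a glibc system: the walk continues while the callback returns 0 and stops at the first non-zero value, which then becomes the return value of `dl_iterate_phdr()`. The sketch below assumes Linux with glibc; the helper names (`count_cb`, `stop_cb`, `walk_all`, `walk_first`) and the marker value 7 are made up for the demonstration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Never stops the walk: counts every loaded object and returns 0. */
static int count_cb(struct dl_phdr_info *info, size_t size, void *data)
{
	(void)info;
	(void)size;
	(*(int *)data)++;
	return 0;
}

/* Stops the walk at the first object by returning a non-zero marker. */
static int stop_cb(struct dl_phdr_info *info, size_t size, void *data)
{
	(void)info;
	(void)size;
	(*(int *)data)++;
	return 7;
}

/* Returns dl_iterate_phdr()'s result; *visited receives the object count. */
int walk_all(int *visited)
{
	assert(visited != NULL);
	*visited = 0;
	return dl_iterate_phdr(count_cb, visited);
}

int walk_first(int *visited)
{
	assert(visited != NULL);
	*visited = 0;
	return dl_iterate_phdr(stop_cb, visited);
}
```

`walk_all()` returns 0 after visiting at least the main program, while `walk_first()` returns 7 after visiting exactly one object. A non-zero result therefore can only come from the callback itself, so a callback that always returns 0 makes the return-value check redundant.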

@namhyung
Owner

Also, can you please rebase onto the current master when you update?

@azharivs

azharivs commented Sep 1, 2023

rebased to namhyung:master

Owner

@namhyung namhyung left a comment

Have you tested this PR? I'm seeing new test failures:

$ make -j8 runtest TESTARG=dynamic
  TEST     test_run
Start 15 tests with 8 worker

Compiler                  gcc                                           clang                                       
Runtime test case         pg             finstrument-fu fpatchable-fun  pg             finstrument-fu fpatchable-fun
------------------------: O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os  O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os
136 dynamic             : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
138 kernel_dynamic      : SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
139 kernel_dynamic2     : SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
140 dynamic_xray        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
223 dynamic_full        : NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ  NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ
224 dynamic_lib         : NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ  NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ
225 dynamic_size        : NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ  NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ
232 dynamic_unpatch     : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
233 dynamic_unpatch2    : NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ  NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ NZ
248 dynamic_dlopen      : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG  NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
263 patchable_dynamic   : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
264 patchable_dynamic2  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
265 patchable_dynamic3  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
266 patchable_dynamic4  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
267 patchable_dynamic5  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK


runtime test stats
====================
total   450  Tests executed (success: 44.44%)
  OK:   200  Test succeeded
  OK:     0  Test succeeded (with some fixup)
  NG:    30  Different test result
  NZ:   120  Non-zero return value
  SG:     0  Abnormal exit by signal
  TM:     0  Test ran too long
  BI:     0  Build failed
  LA:     0  Unsupported Language
  SK:   100  Skipped

@namhyung
Owner

namhyung commented Sep 2, 2023

Current master has fewer failures:

Compiler                  gcc                                           clang                                       
Runtime test case         pg             finstrument-fu fpatchable-fun  pg             finstrument-fu fpatchable-fun
------------------------: O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os  O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os
136 dynamic             : OK OK OK OK OK SK SK SK SK SK NG NG NG NG NG  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
138 kernel_dynamic      : SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
139 kernel_dynamic2     : SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
140 dynamic_xray        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
223 dynamic_full        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
224 dynamic_lib         : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
225 dynamic_size        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
232 dynamic_unpatch     : OK OK OK OK OK SK SK SK SK SK NG NG NG NG NG  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
233 dynamic_unpatch2    : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
248 dynamic_dlopen      : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
263 patchable_dynamic   : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
264 patchable_dynamic2  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
265 patchable_dynamic3  : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG  NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
266 patchable_dynamic4  : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG  NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
267 patchable_dynamic5  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK


runtime test stats
====================
total   450  Tests executed (success: 62.22%)
  OK:   280  Test succeeded
  OK:     0  Test succeeded (with some fixup)
  NG:    70  Different test result
  NZ:     0  Non-zero return value
  SG:     0  Abnormal exit by signal
  TM:     0  Test ran too long
  BI:     0  Build failed
  LA:     0  Unsupported Language
  SK:   100  Skipped

@namhyung
Owner

namhyung commented Sep 2, 2023

Actually v0.14 has no failures:

Compiler                  gcc                                           clang                                       
Runtime test case         pg             finstrument-fu fpatchable-fun  pg             finstrument-fu fpatchable-fun
------------------------: O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os  O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os
136 dynamic             : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
138 kernel_dynamic      : SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
139 kernel_dynamic2     : SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
140 dynamic_xray        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
223 dynamic_full        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
224 dynamic_lib         : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
225 dynamic_size        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
232 dynamic_unpatch     : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
233 dynamic_unpatch2    : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
248 dynamic_dlopen      : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
263 patchable_dynamic   : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
264 patchable_dynamic2  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
265 patchable_dynamic3  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
266 patchable_dynamic4  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
267 patchable_dynamic5  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK


runtime test stats
====================
total   450  Tests executed (success: 77.78%)
  OK:   350  Test succeeded
  OK:     0  Test succeeded (with some fixup)
  NG:     0  Different test result
  NZ:     0  Non-zero return value
  SG:     0  Abnormal exit by signal
  TM:     0  Test ran too long
  BI:     0  Build failed
  LA:     0  Unsupported Language
  SK:   100  Skipped

@honggyukim
Collaborator

The following errors are fixed by #1799, but not merged yet.

265 patchable_dynamic3  : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG  NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG
266 patchable_dynamic4  : NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG  NG NG NG NG NG NG NG NG NG NG NG NG NG NG NG

azharivs and others added 9 commits March 11, 2024 10:11
This is in preparation of runtime dynamic patching. This commit
guarantees that dynamic info is read only once for the target binary and
for each module.

Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]>
Signed-off-by: Clément Guidi <[email protected]>
Signed-off-by: Seyed-Vahid Azhari <[email protected]>
If 'mcount_dynamic_update' is called multiple times (e.g. at runtime),
it initializes the size filter only once.

Signed-off-by: Clément Guidi <[email protected]>
Skip the initialization of the disassembly engine if it has already
been performed.

Signed-off-by: Clément Guidi <[email protected]>
After patching, clear the pattern list for reuse. However, archive its
content in a separate aggregated list. This list contains all applied
patterns and is applied again when opening a dynamic library.

Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]>
Signed-off-by: Clément Guidi <[email protected]>
The 'mcount_dynamic_update' is now safe to call multiple times,
including at runtime. On each call, it will perform patching and
unpatching of the target (not implemented yet).

Signed-off-by: Clément Guidi <[email protected]>
Install a trampoline for each loaded module map, on initialization. Keep
the trampolines in memory to allow for dynamic patching at runtime.
Clear the trampolines on libmcount exit.

Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]>
Signed-off-by: Clément Guidi <[email protected]>
Use a global flag to indicate the state of the target. When the target
is not running, tasks such as dynamic patching can be performed with
fewer constraints.

If libmcount.so is dynamically injected (not implemented yet), the
'mcount_target_running' flag indicates that libmcount has to be
initialized in a running target.

Signed-off-by: Clément Guidi <[email protected]>
Trigger architecture specific dynamic initialization when initializing
the dynamic instrumentation mechanics.

Co-authored-by: Gabriel-Andrew Pollo-Guilbert <[email protected]>
Signed-off-by: Clément Guidi <[email protected]>
Don't save instructions if they are already present in the code hmap.

Signed-off-by: Clément Guidi <[email protected]>
@namhyung
Owner

Thanks for the update, but now I'm seeing some segfaults in the dynamic_dlopen test.

$ make runtest TESTARG=dynamic
...
  TEST     test_run
Start 15 tests with 8 worker

Compiler                  gcc                                           clang                                       
Runtime test case         pg             finstrument-fu fpatchable-fun  pg             finstrument-fu fpatchable-fun
------------------------: O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os  O0 O1 O2 O3 Os O0 O1 O2 O3 Os O0 O1 O2 O3 Os
136 dynamic             : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
138 kernel_dynamic      : SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
139 kernel_dynamic2     : SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
140 dynamic_xray        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
223 dynamic_full        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
224 dynamic_lib         : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
225 dynamic_size        : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
232 dynamic_unpatch     : OK OK OK OK OK SK SK SK SK SK OK OK OK OK OK  SK SK SK SK SK SK SK SK SK SK SK SK SK SK SK
233 dynamic_unpatch2    : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
248 dynamic_dlopen      : OK SG SG SG SG OK SG SG SG SG OK SG SG SG SG  OK SG SG SG SG OK SG SG SG SG OK SG SG SG SG
263 patchable_dynamic   : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
264 patchable_dynamic2  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
265 patchable_dynamic3  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
266 patchable_dynamic4  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK
267 patchable_dynamic5  : OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK  OK OK OK OK OK OK OK OK OK OK OK OK OK OK OK


runtime test stats
====================
total   450  Tests executed (success: 72.44%)
  OK:   326  Test succeeded
  OK:     0  Test succeeded (with some fixup)
  NG:     0  Different test result
  NZ:     0  Non-zero return value
  SG:    24  Abnormal exit by signal
  TM:     0  Test ran too long
  BI:     0  Build failed
  LA:     0  Unsupported Language
  SK:   100  Skipped

@clementguidi
Contributor Author

I'm investigating the segfaults. I added a commit to maintain an aggregated list of patch patterns, to apply it to DLLs, which is causing the problem I think.

@clementguidi
Contributor Author

The dynamic_dlopen is passing on my machine. I cannot reproduce the failure. Could you please share more details?

@namhyung
Owner

Sorry for the late reply.

$ make runtest TESTARG='-vpO2 -c gcc 248'
  TEST     test_run
Start 1 tests without worker pool

Compiler                  gc
Runtime test case         pg
------------------------: O2
build command for library: gcc -o libabc_test_lib.so -fno-inline -fno-builtin -fno-ipa-cp -fno-omit-frame-pointer -D_FORTIFY_SOURCE=0   -O2  -fno-ipa-sra  -shared -fPIC s-lib.c   
build command for library: g++ -o libfoo.so -fno-inline -fno-builtin -fno-ipa-cp -fno-omit-frame-pointer -D_FORTIFY_SOURCE=0   -O2  -fno-ipa-sra  -shared -fPIC s-libfoo.cpp   
build command for executable: gcc -o t-dlopen -fno-inline -fno-builtin -fno-ipa-cp -fno-omit-frame-pointer -D_FORTIFY_SOURCE=0   -O2  -fno-ipa-sra  s-dlopen.c    -Wl,-rpath,$ORIGIN -L.  -ldl
test command: /home/namhyung/project/uftrace/uftrace live --no-pager --no-event --libmcount-path=/home/namhyung/project/uftrace   -P. [email protected] -P.@libabc_test_lib.so -N memcpy t-dlopen
WARN: Segmentation fault: invalid permission (addr: 0x7f07b9f3c110)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.15.2-38-gc4ba ( x86_64 dwarf python3 tui perf sched dynamic kernel )
WARN: =====================================
WARN: [1] (<7f07b9f3c115>[7f07b9f3c115] <= main[559a9a13b100])
WARN: [0] (main[559a9a13b085] <= __libc_start_call_main[7f07b9d766ca])
WARN: child terminated by signal: 11: Segmentation fault
248 dynamic_dlopen      : SG

@namhyung
Owner

Running with gdb shows this.

Thread 2.1 "t-dlopen" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7c60840 (LWP 465665)]
0x00007ffff7f63117 in foo (n=1) at /home/namhyung/project/uftrace/tests/s-libfoo.cpp:14
14		a = n;

It seems the value of rax register was changed.

(gdb) list
9	};
10	
11	extern "C" {
12	void foo(int n)
13	{
14		a = n;
15		AAA::bar(n);
16	}
17	}
(gdb) p &a
$1 = (volatile int *) 0x7ffff7f66014 <a>
(gdb) disas
Dump of assembler code for function foo(int):
   0x00007ffff7f63110 <+0>:	call   0x7ffff7f63ff0
   0x00007ffff7f63115 <+5>:	nop
   0x00007ffff7f63116 <+6>:	nop
=> 0x00007ffff7f63117 <+7>:	mov    %edi,(%rax)
   0x00007ffff7f63119 <+9>:	jmp    0x7ffff7f63030 <_ZN3AAA3barEi@plt>
End of assembler dump.
(gdb) p/x $rax
$2 = 0x7ffff7f63110

@clementguidi
Contributor Author

I'm still not able to reproduce the failure. Do you mind sharing the environment you're using? Maybe I can run a VM with the same config?

@namhyung
Owner

It's a custom debian (testing) with some company internal changes. But I think it depends on toolchains.

$ gcc --version
gcc (Debian 13.2.0-10) 13.2.0
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@namhyung
Owner

Ok, I think I found the problem. It seems dynamic tracing on dlopen() is already broken, because code_hmap doesn't handle functions loaded with dlopen correctly. I think we should split the hashmap into one for the initial code/libraries and one for dlopen. The original hash map should remain lockless, and a lookup can try the second hash map only if it couldn't find an entry in the original map. The new dlopen hashmap should be protected by a lock, and the matching entries should be removed when dlclose() is executed.

In my environment, it caused the problem only if I traced libabc_test_lib and libfoo together. And I found that dlopen loaded the two libraries at exactly the same address. So when the second library patches the function, it finds the entry in the hashmap and won't add a new entry. Then later, when it executes the function, it finds an entry for the previous function (which is already unloaded) and runs the wrong code... boom!
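
A rough sketch of the two-map idea described above, under loud assumptions: fixed-size arrays with a linear scan stand in for the real hashmap, `insns_id` stands in for the saved original instructions, and every name here (`init_map`, `dl_map`, `add_initial`, `add_dlopened`, `remove_dlclosed`, `lookup`) is hypothetical rather than uftrace API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 64

struct code_entry {
	uintptr_t addr;     /* patched function address (the key) */
	int       insns_id; /* stand-in for the saved original instructions */
	int       used;
};

/* Filled once at startup, then read-only: lookups take no lock. */
static struct code_entry init_map[MAX_ENTRIES];
/* Filled on dlopen(), pruned on dlclose(): guarded by dl_lock. */
static struct code_entry dl_map[MAX_ENTRIES];
static pthread_mutex_t dl_lock = PTHREAD_MUTEX_INITIALIZER;

static struct code_entry *find(struct code_entry *map, uintptr_t addr)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++)
		if (map[i].used && map[i].addr == addr)
			return &map[i];
	return NULL;
}

static int add(struct code_entry *map, uintptr_t addr, int insns_id)
{
	for (size_t i = 0; i < MAX_ENTRIES; i++) {
		if (!map[i].used) {
			map[i] = (struct code_entry){ addr, insns_id, 1 };
			return 0;
		}
	}
	return -1; /* table full */
}

int add_initial(uintptr_t addr, int insns_id)
{
	return add(init_map, addr, insns_id);
}

int add_dlopened(uintptr_t addr, int insns_id)
{
	pthread_mutex_lock(&dl_lock);
	int ret = add(dl_map, addr, insns_id);
	pthread_mutex_unlock(&dl_lock);
	return ret;
}

/* On dlclose(): drop every entry inside the unloaded [start, end) range. */
void remove_dlclosed(uintptr_t start, uintptr_t end)
{
	assert(start <= end);
	pthread_mutex_lock(&dl_lock);
	for (size_t i = 0; i < MAX_ENTRIES; i++)
		if (dl_map[i].used && dl_map[i].addr >= start && dl_map[i].addr < end)
			dl_map[i].used = 0;
	pthread_mutex_unlock(&dl_lock);
}

/* Returns insns_id for addr, or -1 if the address was never patched. */
int lookup(uintptr_t addr)
{
	struct code_entry *e = find(init_map, addr);
	int id = -1;

	if (e)
		return e->insns_id;

	/* Fall back to the dlopen map only when the lockless map misses. */
	pthread_mutex_lock(&dl_lock);
	e = find(dl_map, addr);
	if (e)
		id = e->insns_id;
	pthread_mutex_unlock(&dl_lock);
	return id;
}
```

Because `remove_dlclosed()` clears the range of an unloaded library, a second library that later lands at the same address gets a fresh entry instead of inheriting the stale one, which is the failure mode described in the comment.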

4 participants